085-2007: Text Mining and PROC KDE to Rank Nominal Data

Author

  • Patricia B. Cerrito
Abstract

By definition, nominal data cannot be ranked. However, there are circumstances where ranking nominal data is essential. Examples include ranking hospitals and colleges, defining the “most livable cities”, and ranking conference paper submissions. In this project, we consider ranking patient severity. The purpose is to determine how patient severity can be used to rank the quality of hospital performance. There are thousands of patient diagnoses and co-morbidities, which makes such a ranking very difficult. Generally, nominal variables have been ranked by using quantitative outcome variables. Current hospital quality measures use stepwise logistic regression to reduce the number of patient diagnoses considered when defining a measure of patient severity. More recently, a weight-of-evidence method has been developed for predictive modeling in which nominal data are compressed and ranked using a target variable. However, methods are now available that rank nominal data without requiring outcome variables; instead, outcome variables can be used to validate the ranking. Ranking can be done using SAS Text Miner to compress nominal data fields containing information on patient diagnoses, combined with PROC KDE to define and validate the patient severity ranking. We demonstrate that SAS Text Miner can define an implied ranking of nominal fields, which is identified through the application of PROC KDE. Once the patient severity rank has been defined, it is used to examine patient outcomes and physician variability in patient outcomes.
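The kernel density step described in the abstract can be illustrated with a minimal sketch. This is not the author's SAS code: plain Python stands in for PROC KDE, the text-mining step is assumed to have already assigned each patient to a cluster, and the cluster names, the length-of-stay values, the bandwidth, and the choice to order clusters by the location of the density peak are all hypothetical.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample`
    using a Gaussian kernel with the given bandwidth."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


def density_peak(sample, bandwidth, grid):
    """Return the grid point at which the estimated density is highest."""
    density = gaussian_kde(sample, bandwidth)
    return max(grid, key=density)


# Hypothetical data: length of stay (days) for patients in three clusters
# that a text-mining step is assumed to have produced from diagnosis fields.
stays_by_cluster = {
    "cluster_A": [2, 3, 3, 4, 3, 2, 4],
    "cluster_B": [6, 7, 8, 7, 6, 9, 7],
    "cluster_C": [12, 14, 13, 15, 13, 12, 14],
}

# Evaluation grid from 0 to 20 days in steps of 0.25 (an arbitrary choice).
grid = [0.25 * i for i in range(81)]

# Assumed bandwidth of 1 day; PROC KDE would normally select its own.
peaks = {
    name: density_peak(stays, 1.0, grid)
    for name, stays in stays_by_cluster.items()
}

# Order clusters from lowest to highest density peak as an implied severity rank.
severity_rank = sorted(peaks, key=peaks.get)
print(severity_rank)
```

In practice the estimated density curves for each cluster would be compared over their whole range, not only at their peaks; the peak is used here only to keep the example short.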

Similar resources

Statistics to measure correlation for data mining applications

Correlation is usually used in the context of real-valued sequences but, in data mining, the values of fields may be of various types—real, nominal or ordinal. Techniques for measuring correlation between any two sequences of data are reviewed, regardless of their type. In particular, a new technique for measuring the correlation between real-valued data and nominal data is proposed. The techni...

Mining evolutionary dependencies from web-localization repositories

An approach to mining repositories of web-based user documentation for patterns of evolutionary change in the context of internationalization and localization is presented. Localized web documents that are frequently co-changed (i.e., an evolutionary dependency) during the natural language translation process are uncovered to support the future evolution of the system. A sequential-pattern mini...

A Model for Extracting Information from Text Documents, Based on Text Mining in the E-Learning Domain

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...

Rank and select revisited and extended

The deep connection between the Burrows-Wheeler transform (BWT) and the so-called rank and select data structures for symbol sequences is the basis of most successful approaches to compressed text indexing. Rank of a symbol at a given position equals the number of times the symbol appears in the corresponding prefix of the sequence. Select is the inverse, retrieving the positions of the symbol o...

An Investigation of Linkage in Nominal Data

The Internet provides access to vast volumes of nominal data (data containing names) collected for a range of different purposes (e.g. parish registers containing baptism, marriage, and burial records). To mine these data effectively, methods must exist that are aware of both the source and semantics of the data, as well as the types of linkage (relationships between records) that can exist. Fu...

Publication date: 2007